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Abstract — We address high dimensional covariance estima- 
tion for elliptical distributed samples, which are also known 
as spherically invariant random vectors (SIRV) or compound- 
Gaussian processes. Specifically we consider shrinkage methods 
that are suitable for high dimensional problems with a small 
number of samples (large p small n). We start from a classical 
robust covariance estimator [Tyler(1987)], which is distribution- 
free within the family of elliptical distribution but inapplicable 
when n < p. Using a shrinkage coefficient, we regularize Tyler's 
fixed point iterations. We prove that, for all n and p, the proposed 
fixed point iterations converge to a unique limit regardless 
of the initial condition. Next, we propose a simple, closed- 
form and data dependent choice for the shrinkage coefficient, 
which is based on a minimum mean squared error framework. 
Simulations demonstrate that the proposed method achieves low 
estimation error and is robust to heavy-tailed samples. Finally, as 
a real world application we demonstrate the performance of the 
proposed technique in the context of activity/intrusion detection 
using a wireless sensor network. 

Index Terms — Covariance estimation, large p small n, shrink- 
age methods, robust estimation, elliptical distribution, activ- 
ity/intrusion detection, wireless sensor network 



I. Introduction 

Estimating a covariance matrix (or a dispersion matrix) 
is a fundamental problem in statistical signal processing. 
Many techniques for detection and estimation rely on accurate 
estimation of the true covariance. In recent years, estimating 
a high dimensional p x p covariance matrix under small 
sample size n has attracted considerable attention. In these 
"large p small n" problems, the classical sample covariance 
suffers from a systematically distorted eigen-structure j6j|, and 
improved estimators are required. 

Much effort has been devoted to high-dimensional co- 
variance estimation, which use Steinian shrinkage U-O) or 
other types of regularized methods such as J4), 0. However, 
most of the high-dimensional estimators assume Gaussian 
distributed samples. This limits their usage in many important 
applications involving non-Gaussian and heavy-tailed sam- 
ples. One exception is the Ledoit-Wolf estimator 0, where 
the authors shrink the sample covariance towards a scaled 
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identity matrix and proposed a shrinkage coefficient which is 
asymptotically optimal for any distribution. However, as the 
Ledoit-Wolf estimator operates on the sample covariance, it 
is inappropriate for heavy tailed non-Gaussian distributions. 
On the other hand, traditional robust covariance estimators 
ll8l- lfT0l designed for non-Gaussian samples generally require 
n 3> p and are not suitable for "large p small n" problems. 
Therefore, the goal of our work is to develop a covariance 
estimator for problems that are both high dimensional and 
non-Gaussian. In this paper, we model the samples using the 
elliptical distribution JTJ, which is also referred to as the 
spherically invariant random vector model (SIRV) 1261 . Il27l 
or the compound-Gaussian process model 11311 . As a flexible 
and popular alternative, the elliptical family encompasses a 
large number of important distributions such as Gaussian 
distribution, the multivariate Cauchy distribution, the mul- 
tivariate exponential distribution, the multivariate Student-T 
distribution, the K-distribution and the Weibull distribution. 
The capability of modelling heavy-tails makes the elliptical 
distribution appealing in signal processing and related fields. 
Typical applications include radar detection lfl3l . ifTTl . 1201 . 
El . speech signal processing l23l . remote sensing 1241 . 
wireless fading channels modelling 1271 . financial engineering 
ll25l and so forth. 

A well-studied covariance estimator for elliptical distri- 
butions is the ML estimator based on normalized samples 
(3, 03), fl6l . The estimator is derived as the solution to 
a fixed point equation by using fixed point iterations. It is 
distribution-free within the class of elliptical distributions and 
its performance advantages are well known in the n 3> p 
regime. However, it is not suitable for the "large p small 
77" setting. Indeed, when n < p, the ML estimator as 
defined does not even exist. To avoid this problem the authors 
of IF2TI propose an iterative regularized ML estimator that 
employs diagonal loading and uses a heuristic procedure for 
selecting the regularization parameter. While they did not 
establish convergence and uniqueness ETI they empirically 
demonstrated that their algorithm has superior performance in 
the context of a radar application. Our approach is similar 
to ||2T1 but we propose a systematic procedure of selecting 
the regularization parameter and establish convergence and 
uniqueness of the resultant iterative estimator. Specifically, 
we consider a shrinkage estimator that regularizes the fixed 
point iterations. For a fixed shrinkage coefficient, we prove that 
the regularized fixed iterations converge to a unique solution 
for all ?i and p, regardless of the initial condition. Next, 
following Ledoit-Wolf |2], we provide a simple closed-form 
expression for the shrinkage coefficient, based on minimizing 
mean-squared-error. The resultant coefficient is a function 
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of the unknown true covariance and cannot be implemented 
in practice. Instead, we develop a data-dependent "plug- 
in" estimator approximation. Simulation results demonstrate 
that our estimator achieves superior performance for samples 
distributed within the elliptical family. Furthermore, for the 
case that the samples are truly Gaussian, we report very 
little performance degradation with respect to the shrinkage 
estimators designed specifically for Gaussian samples ]3). 

As a real world application we demonstrate the proposed 
estimator for activity/intrusion detection using an active wire- 
less sensor network. We show that the measured data exhibit 
strong non-Gaussian behavior and demonstrate significant 
performance advantages of the proposed robust covariance 
estimator when used in a covariance-based anomaly detection 
algorithm. 

The paper is organized as follows. Section II provides a 
brief review of elliptical distributions and of Tyler's covariance 
estimation method. The regularized covariance estimator is 
introduced and derived in Section III. We provide simulations 
and experimental results in Section IV and Section V, respec- 
tively. Section VI summarizes our principal conclusions. The 
proof of theorems and lemmas are provided in the Appendix. 

Notations: In the following, we depict vectors in lowercase 
boldface letters and matrices in uppercase boldface letters. ( ) T 
denotes the transpose operator. Tr(-) and dct(-) are the trace 
and the determinant of a matrix, respectively. 

II. ML COVARIANCE ESTIMATION FOR ELLIPTICAL 
DISTRIBUTIONS 

A. Elliptical distribution 

Let x be a p x 1 zero-mean random vector generated by the 
following model 

X = isu, (1) 

where v is a positive random variable and u is a p x 1 zero- 
mean, jointly Gaussian random vector with positive definite 
covariance S. We assume that v and u are statistically 
independent. The resulting random vector x is elliptically 
distributed and its probability density function (pdf) can be 
expressed by 

p(x) = 0(x T S- 1 x), (2) 



where </>(•) is the characteristic function (Definition 2, pp. 5, 
ll25l ) related to the pdf of v. The elliptical family encompasses 
many useful distributions in signal processing and related 
fields and includes: the Gaussian distribution itself, the K dis- 
tribution, the Weibull distribution and many others. As stated 
above, elliptically distributed samples are also referred to as 
Spherically Invariant Random Vectors (SIRV) or compound 
Gaussian processes in signal processing. 

B. ML estimation 

Let {x,}™ =1 be a set of n independent and identically dis- 
tributed (i.i.d.) samples drawn according to ([TJ. The problem is 
to estimate the covariance (dispersion) matrix S from 
The model (Q]i is invariant to scaling of the covariance matrix 
S of u. Therefore, without loss of generality, we assume that 



the covariance matrix is trace-normalized in the sense that 
Tr(S) = p. 

The commonly used sample covariance, defined as 



(3) 



is known to be a poor estimator of S, especially when the 
samples are high-dimensional (large p) and/or heavy-tailed. 
Tyler's method |9| addresses this problem by working with 
the normalized samples: 

s- - - (4) 
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for which the term v in (Q~|) drops out. The pdf of Si is given 
by E5) 

TCP/ 2 ) /TTTTvt^TTT f_T„-l_ 



P(s,;£) 



(5) 



27rP/ 2 

Taking the derivative and equating to zero, the maximum 
likelihood estimator based on {si}" =1 is the solution to 

n t 

S=g-V Jf\ . (6) 



When n > p, the ML estimator can be found using the 
following fixed point iterations: 



:1 S I^j S t 



(7) 



where the initial So is usually set to the identity matrix. 
Assuming that n > p and that any p samples out of {si}™ =1 
are linearly independent with probability one, it can be shown 
that the iteration process in (0 converges and that the limiting 
value is unique up to constant scale, which does not depend on 
the initial value of So- In practice, a final normalization step 
is needed, which ensures that the iteration limit Soo satisfies 
Tr(S 00 ) = p. 

The ML estimate corresponds to the Huber-type M- 
estimator and has many good properties when n 3> p, such as 
asymptotic normality and strong consistency. Furthermore, it 
has been pointed out |9 | that the ML estimate 01 is the "most 
robust" covariance estimator in the class of elliptical distri- 
butions in the sense of minimizing the maximum asymptotic 
variance. We note that 0) can be also motivated from other 
approaches as proposed in fl4l . Ifl6l . 

III. Robust shrinkage covariance estimation 

Here we extend Tyler's method to the high dimensional 
setting using shrinkage regularization. It is easy to see that 
there is no solution to (|6]) when n < p (its left-hand-side is 
full rank whereas its right-hand-side of is rank deficient). This 
motivates us to develop a regularized covariance estimator for 
elliptical samples. Following 0, O, we propose to regularize 
the fixed point iterations as 



-i = (i-p)*£ 

71 ^ 
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Tr(S J+1 )/p 



(8) 
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where p is the so-called shrinkage coefficient, which is a 
constant between and 1 . When p = and n > p the proposed 
shrinkage estimator reduces to Tyler's unbiased method in 
(O and when p = 1 the estimator reduces to the trivial 
uncorrected case yielding a scaled identity matrix. The term 
pi ensures that Ej+i is always well-conditioned and thus 
allows continuation of the iterative process without the need 
for restarts. Therefore, the proposed iteration can be applied to 
high dimensional estimation problems. We emphasize that the 
normalization (0 is important and necessary for convergence. 
We establish provable convergence and uniqueness of the limit 
in the following theorem. 

Theorem 1. Let < p < 1 be a shrinkage coefficient. Then, 
the fixed point iterations in ((S]) and ((9]) converge to a unique 
limit for any positive definite initial matrix So. 

The proof of Theorem Q] follows directly from the concave 
Perron-Frobenius theory [28 ) and is provided in the Appendix. 
We note that the regularization presented in (0 and (0 is 
similar to diagonal loading 12T1 . However, unlike the diagonal 
loading approach of ||2T1| . the proposed shrinkage approach 
provides a systematic way to choose the regularization pa- 
rameter p, discussed in the next section. 

A. Choosing the shrinkage coefficient 

We now turn to the problem of choosing a good, data- 
dependent, shrinkage coefficient p, as as an alternative to 
cross-validation schemes which incur intensive computational 
costs. As in Ledoit-Wolf J2), we begin by assuming we 
"know" the true covariance X. Then we define the following 
clairvoyant "estimator": 



£(/>) = (i 



i=l 



+ pl, 



(10) 



where the coefficient p is chosen to minimize the minimum 
mean-squared error: 

2 
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arg min E 

p 



£(p) 
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The following theorem shows that there is a closed-form 
solution to the problem ( fTTT i. which we refer to as the "oracle" 
coefficient. 

Theorem 2. For i.i.d. elliptical distributed samples the solu- 
tion to ([77} is 

~ 2 ■ (1 - 2/p)Tr(£ 2 ) 



Po 



P 



(p 2 -np- 2n) + (n + 1 + 2(n - l)/p)Tr(E 2 ) ' 



(12) 

under the condition Tr(S) = p. 

The proof of Theorem [2] requires the calculation of the 
fourth moments of an isotropically distributed random vector 
ll30l - l32l and is provided in the Appendix. 

The oracle coefficient cannot be implemented since po is 
a function of the unknown true covariance S. Therefore, we 
propose a plug-in estimate for po- 

" 2 1 (l-2/p)Tr(M 2 ) 



P 



P 



(p 2 - np - 2n) + (n + 1 + 2(n - l)/p)Tr(M 2 ) ' 



(13) 



where M can be any consistent estimator of S, e.g., the 
trace-normalized Ledoit-Wolf estimator. Another appealing 
candidate for plug-in is the (trace-normalized) normalized 
sample covariance R lTT2l defined by: 



R 



n t—' 



(14) 



We note that the only requirement on the covariance estimator 
M is that it provide a good approximation to Tr(S 2 ). It does 
not have to be well-conditioned nor does it have to be an 
accurate estimator of the true covariance matrix X. 

By using the plug-in estimate p in place of p, the robust 
shrinkage estimator is computed via the fixed point iteration 
in © and (0. 

IV. Numerical simulation 

In this section we use simulations to demonstrate the su- 
perior performance of the proposed shrinkage approach. First 
we show that our estimator outperforms other estimators for 
the case of heavy-tailed samples generated by a multivariate 
Student-T distribution, where v in (0 is a function of a Chi- 
square random variable: 



(15) 



The degree-of-freedom d of this multivariate Student-T statis- 
tic is set to 3. The dimensionality p is chosen to be 100 and 
we let S be the covariance matrix of an AR(1) process, 




£(m) 



-j l 



(16) 



where denotes the entry of X in row i and column 

j, and r is set to 0.7 in this simulation. The sample size n 
varies from 5 to 225 with step size 10. All the simulations are 
repeated for 100 trials and the average empirical performance 
is reported. 

We use ( TT3l with M = R and employ iterations defined by 
(0 and (0 with p = p. For comparison, we also plot the results 
of the trace-normalized oracle in (fl2l i. the trace-normalized 
Ledoit-Wolf estimator 13, and the non-regularized solution in 
(0 (when n > p). As the Ledoit-Wolf estimator operates on 
the sample covariance which is sensitive to outliers, we also 
compare to a trace-normalized version of a clairvoyant Ledoit- 
Wolf estimator implemented according to the procedure in Q 
with known v. More specifically, the samples Xj are firstly nor- 
malized by the known realizations yielding truly Gaussian 
samples; then the sample covariance of the normalized x,'s is 
computed, which is used to estimate the Ledoit-Wolf shrinkage 
parameters and estimate the covariance via equation (14) in 
ID. The MSE are plotted in Fig.[T] It can be observed that the 
proposed method performs significantly better than the Ledoit- 
Wolf estimator, and the performance is very close to the ideal 
oracle estimator using the optimal shrinkage parameter ( fT2l ). 
Even the clairvoyant Ledoit-Wolf implemented with known 
Vi does not outperform the proposed estimator in the small 
sample size regime. These results demonstrate the robustness 
of the proposed approach. 
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As a graphical illustration, in Fig. [2] we provide covariance 
visualizations for a realization of the estimated covariances 
using the Ledoit-Wolf method and the proposed approach. The 
sample size in this example is set to 50, which is smaller than 
the dimension 100. Compared to the true covariance, it is clear 
that the proposed covariance estimator preserves the structure 
of the true covariance, while in the Ledoit-Wolf covariance 
procudure produces "block pattern" artifacts caused by heavy- 
tails of the multivariate Student-T. 

When n > p, we also observe a substantial improvement of 
the proposed method over the ML covariance estimate, which 
provides further evidence of the power of Steinian shrinkage 
for reducing MSE. 
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Fig. 1. Multivariate Student-T samples: Comparison of different trace- 
normalized covariance estimators when p = 100. 





(a) Ledoit-Wolf 



(b) Proposed 



Fig. 2. Multivariate Student-T samples: Visualizations of two estimates using 
the Ledoit-Wolf and the proposed approaches, p = 100, n = 50. Note that 
n < p in this case. 

In order to assess the tradeoff between accuracy and ro- 
bustness we investigate the case when the samples are truly 
Gaussian distributed. We use the same simulation parameters 
as in the previous example, the only difference being that 
the samples are now generated from a Gaussian distribution. 
The performance comparison is shown in Fig. [5] where four 
different (trace-normalized) methods are included: the oracle 
estimator derived from Gaussian assumptions (Gaussian or- 
acle) [0, the iterative approximation of the Gaussian oracle 
(Gaussian OAS) proposed in J3), the Ledoit-Wolf estimator 
and the proposed method. It can be seen that for truly Gaussian 
samples the proposed method performs very closely to the 
Gaussian OAS, which is specifically designed for Gaussian 



distributions. Indeed, for small sample size (n < 20), the 
proposed method performs even better than the Ledoit-Wolf 
estimator. This indicates that, although the proposed robust 
method is developed for the entire elliptical family, it actually 
sacrifices very little performance for the case that the distri- 
bution is Gaussian. 
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Fig. 3. Gaussian samples: Comparison of trace-normalized different covari- 
ance estimators when p = 100. 



V. Application to anomaly detection in wireless 

SENSOR NETWORKS 

In this section we demonstrate the proposed robust covari- 
ance estimator in a real application: activity detection using a 
wireless sensor network. 

The experiment was set up on an Mica2 sensor network 
platform, as shown in Fig. |H which consists of 14 sensor 
nodes randomly deployed inside and outside a laboratory at 
the University of Michigan. Wireless sensors communicated 
with each other asynchronously by broadcasting an RF signal 
every 0.5 seconds. The received signal strength (RSS), de- 
fined as the voltage measured by a receiver's received signal 
strength indicator circuit (RSSI), was recorded for each pair of 
transmitting and receiving nodes. There were 14 x 13 = 182 
pairs of RSSI measurements over a 30 minute period, and 
samples were acquired every 0.5 sec. During the experiment 
period, persons walked into and out of the lab at random times, 
causing anomaly patterns in the RSSI measurements. Finally, 
for ground truth, a web camera was employed to record the 
actual activity. 




Fig. 4. Experimental platform: wireless Mica2 sensor nodes. 
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Fig. [5] shows all the received signals and the ground truth 
indicator extracted from the video. The objective of this 
experiment was intrusion (anomaly) detection. We emphasize 
that, with the exception of the results shown in Fig. [lOl the 
ground truth indicator is only used for performance evaluation 
and the detection algorithms presented here were completely 
unsupervised. 



by a Gaussian distribution. As additional evidence, we fitted a 
Student-T distribution to the first detrended RSS sequence, and 
used maximum likelihood to estimate the degree-of-freedom 
as d = 2 with a 95% confidence interval (CI) [1.8460, 2.2879]. 
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Fig. 6. QQ plot of data versus the standard Gaussian distribution. 



Fig. 5. At bottom 182 RSS sequences sampled from each pair of transmitting 
and receiving nodes in intrusion detection experiment. Ground truth indicators 
at top are extracted from video captured from a web camera that recorded the 
scene. 

To remove temperature drifts O of receivers we detrended 
the data as follows. Let Xi[k] be the k-th sample of the z-th 
RSS signal and denote 

x[fc] = (x 1 [k] 1 x 2 [k],...,x W2 [k]) T - (17) 
The local mean value of x.[k] is defined by 

^ fe+m 

*w = 2^+i £ *[*]■ (18) 

i—k — rn 

where the integer m determines local window size and is set 
to 50 in this study. We detrend the data by subtracting this 
local mean 

y[k] = x[fc]-x[fc], (19) 

yielding a detrended sample y[k] used in our anomaly detec- 
tion. 

We established that the detrended measurements were 
heavy-tailed non-Gaussian by performing several statistical 
tests. First the Lilliefors test IT371 of Gaussianity was performed 
on the detrended RSS measurements. This resulted in rejection 
of the Gaussian hypothesis at a level of significance of 10 -6 . 
As visual evidence, we show the quantile-quantile plot (QQ 
plot) for one of the detrended RSS sequences in Fig. [6] which 
illustrates that the samples are non-Gaussian. In Fig. [21 we plot 
the histograms and scatter plots of two of the detrended RSS 
sequences, which shows the heavy-tail nature of the sample 
distribution. This strongly suggests that the RSS samples can 
be better described by a heavy-tailed elliptical distribution than 
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Fig. 7. Histograms and scatter plots of the first two de-trended RSS 
sequences, which are fit by a multivariate Student-T distribution with degree- 
of-freedom d = 2. 

Consider the following function of the detrended data: 

t fc =y[fc] T E- 1 y[fc]- (20) 

for known S = E {y[fc]y[/c] T }, tk is a statistic that has been 
previously proposed for anomaly detection 13311 . A time sam- 
ple is declared to be anomalous if the test statistic tk exceeds 
a specified threshold. We then applied our proposed robust 
covariance estimator to estimate the unknown £ and imple- 
mented ( f20b for activity detection. Specifically, we constructed 
the 182 x 182 sample covariance by randomly subsampling 200 
time slices from the RSS data shown in Fig. [5] Note, that these 
200 samples correspond to a training set that is contaminated 
by anomalies at the same anomaly rate (approximately 10%) as 
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the entire sample set. The detection performance was evaluated 
using the receiver operating characteristic (ROC) curve, where 
the averaged curves from 200 independent Monte-Carlo trials 
are shown in Fig. [8] For comparison, we also implemented 
the activity detector ( f20b with other covariance estimates 
including: the sample covariance, the Ledoit-Wolf estimator 
and Tyler's ML estimator. 
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Fig. 8. Performance comparison for different covariance estimators, p = 
182, n = 200. 

From the mean ROCs we can see that the detection perfor- 
mances are rank ordered as follows: Proposed > Ledoit-Wolf 
> Tyler's ML > Sample covariance. The sample covariance 
performs poorly in this setting due to the small sample size 
(n = 200, p = 182) and its sensitivity to the heavy-tailed 
distribution shown in Fig. [6] and Q The Tyler ML method 
and the Ledoit-Wolf estimator improve upon the sample co- 
variance since they compensate for heavy tails and for small 
sample size, respectively. Our proposed method compensates 
for both effects simultaneously and achieves the best detection 
performance. 

We also plot the 90% confidence envelopes, determined 
by cross-validation, on the ROCs in Fig. [9] The width of 
the confidence interval reflects the sensitivity of the anomaly 
detector to variations in the training set. Indeed, the upper and 
lower endpoints of the confidence interval are the optimistic 
and the pessimistic predictions of detection performance. The 
proposed method achieves the smallest width among the four 
computed 90% confidence envelopes. 

Finally, for completeness we provide performance com- 
parison of covariance-based supervised activity detection al- 
gorithms in Fig. [10] The training period is selected to be 
[251,450] based on ground truth where no anomalies appear. 
It can be observed that, by excluding the outliers caused by 
anomalies, the performance of the Ledoit-Wolf based intrusion 
detection algorithm is close to that of the proposed method. 
We conclude that the activity detection performance of the 
proposed covariance estimator is more robust than the other 
three estimators with respect to outlier contamination in the 
training samples. 
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Fig. 9. Performance comparison for different covariance estimators, including 
the mean value and 90% confidence intervals, (a): Sample covariance. (b): 
Proposed, (c): Ledoit-Wolf. (d): Tyler's ML. The 200 training samples are 
randomly selected from the entire data set. 




Fig. 10. Performance comparison for different covariance estimators, p = 
182, n = 200. The covariance matrix is estimated in a supen'ised manner. 

VI. Conclusion 

In this paper, we proposed a shrinkage covariance estimator 
which is robust over the class of elliptically distributed sam- 
ples. The proposed estimator is obtained by fixed point itera- 
tions, and we established theoretical guarantees for existence, 
convergence and uniqueness. The optimal shrinkage coefficient 
was derived using a minimum mean-squared-error framework 
and has a closed-form expression in terms of the unknown true 
covariance. This expression can be well approximated by a 
simple plug-in estimator. Simulations suggest that the iterative 
approach converges to a limit which is robust to heavy-tailed 
multivariate Student-T samples. Furthermore, we show that 
for the Gaussian case, the proposed estimator performs very 
closely to previous estimators designed expressly for Gaussian 
samples. 

As a real world application we demonstrated the perfor- 
mance of the proposed estimator in intrusion detection using 
a wireless sensor network. Implementation of a standard 



7 



covariance-based detection algorithm using our robust covari- 
ance estimator achieved superior performances as compared 
to conventional covariance estimators. 

The basis of the proposed method is the ML estimator 
originally proposed by Tyler in (9|. However, the approach 
presented in this paper can be extended to other M-estimators. 

One of the main contributions of our work is the proof 
of uniqueness and convergence of the estimator. This proof 
extends the results of (9), lfl8l to the regularized case. Re- 
cently, an alternative proof to the non-regularized case using 
convexity on manifolds was presented in l35l . This latter proof 
highlights the geometrical structure of the problem and gives 
additional insight. We are currently investigating its extension 
to the regularized case. 
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VIII. Appendix 

A. Proof of Theorem Q] 

In this appendix we prove Theorem Q] The original con- 
vergence proof for the non-regularized case in (9), lfl8l is 
based on careful exploitation of the specific form of ©. In 
contrast, our proof for the regularized case is based on a direct 
connection from the concave Perron-Frobenius theory l28l . 
11291 that is simpler and easier to generalize. We begin by 
summarizing the required concave Perron-Frobenius result in 
the following lemma. 



Lemma 1 ( 11281 ). Let (E, ||-||) be a Banach space with K C E 
being a closed, convex cone on which || • || is increasing, i.e., for 
which x < y implies \\x\\ < \\y\\, where the operator < on the 
convex cone K means that if x < y then y — x G K. Define 
U = {x\x G K, \ \x\ \ = 1}. Let T : K K be a concave 
operator such that 

T(px + (1 - p)y) > pT(x) + (1 - p)T(y), 
for all p G [0, 1], all x, y G K. 

If for some e G K — {0} and constants a > 0, b > there is 

ae < T(x) < be, for all x G U, (22) 

then there exists a unique x* G U to which the iteration of 
the normalized operator T(x) = T(x)/\\T(x)\\,x G K — {0} 
converges: 



(21) 



lim f k (x) 



for all x G K - {0}. 



(23) 



Lemma Q] can be obtained by combining results from 
Lemma 2 and Theorem in Section 4 of fl28l . Here we show that 
the proof of Theorem Q] is a direct result of applying Lemma 
Q] with proper definitions of E, K, U and T: 

• E: the set of all symmetric matrices; 

> K: the set of all positive semi-definite matrices; 



IS II: the normalized nuclear norm of S, i.e., 



Ail, 



3=1 



(24) 



where Xj is the j-th eigenvalue of S and | ■ | is the absolute 
value operator. Note that for any S G K, the nuclear 
norm || • || is identical to Tr(-)/p and is increasing; 
U: the set U = {S|S G K, ||S|| = 1}; 
T: the mapping from K to K defined by 



T(S) = (l-p)^> W (s 4 ,S)s A J +pl, 



where the weight function w(s i7 H) is defined as 



w(si, S) = inf 



s T Sz 



(25) 



(26) 



z r "s7#0 (sf Z) 2 ' 

for any S G K. 

Proof: With the above definitions we show that Theorem 
Q] is a direct result of Lemma Q] We begin by showing that 
the mapping operator T is concave. Indeed, it is sufficient to 
show that w(si, S) in concave in S, which is true because it 
is the infinimum of affine functions of S. 

Next, we show that T satisfies condition (l22l with e = I. 
It is easy to see that 

pi < T(S), (27) 
for any S G U. Then we show that 

w( Si ,-£)<p, (28) 

for any S G U. Indeed, 

r ■ z T Sz sfSsi A max 

w ( s ii S) = ml - ^ < — -= — — < — = — = A max , 
z^ Sl #o(sfz) 2 (sfs,) 2 sfs, 

(29) 

where A max is the maximum eigenvalue of S. The last equality 
in the right-hand-side of d29t comes from the fact that is of 
unit norm by definition ©. d28l is thus obtained by noticing 
that S G U and A max < p. Substituting (l28l into dZSb we have 

T(S) < (1 - p)p 2 R + pl< ((1 - p)p 2 a max + p) I, (30) 

where 



R 



1 



^ ] S i S i ) 



and a max is the maximum eigenvalue of R. Again, as Sj is 
of unit norm, a max < Tr(R) = 1 and 



T(S) < ((1 - p)p 2 + p) I. 



(31) 



Therefore, we have shown that T satisfies condition ( l22l . 
where e = I, a = p and b = (1 — p)p 2 + p. In addition, 
(l22l establishes that the mapping T from U always yields a 
positive definite matrix. Therefore, the convergent limit of the 
fixed-point iteration is positive definite. 

Finally, we note that, for any S y 0, we have 

11=11 = 02) 



8 



and 



w(si,£) = j nf , 



z T £ S 



(33) 



z r Si #o (sfz) 2 sfS-^j' 

The limit d23l l is then identical to the limit of proposed 
iterations ^ and (|9]l for any £ >- 0. Therefore, Theorem 
Q] has been proved. 

B. Proof of Theorem [2] 

Proof: To ease the notation we define C as 



(34) and 



The shrinkage estimator in ( TTOb is then 

S(p) = (l-p)C + pI. 



(35) 



By substituting ( UBl into ( TTOb and taking derivatives of p, we 
obtain that 



Po 



s{Tr((I-C)(S-C))} 











I-C 


:} 










- 77112 


+ Tr(E) 


771 2 ^ 


2t71h 


fp 



(36) 



where 



7712 = £{Tr(C 2 )}, 
mn =s{Tr(C)), 



and 



77112 = E 



{Tr(CS)} 



(37) 
(38) 

(39) 



Next, we calculate the moments. We begin by eigen- 
decomposing S as 



and denote 
Then, we define 



s = uDir , 

A = UD 1/2 . 



\A- l s. 



(40) 
(41) 

(42) 



' 2 



H 2 



Noting that is a Gaussian distributed random vector with 
covariance X, it is easy to see that ||zi|| 2 = 1 and and Zj 
are independent with each other for i ^ j. Furthermore, z$ is 
isotropically distributed ED-ES and satisfies 0, ED 



£{z iZ f} = -I, 



{(^Dz,) 2 } 



1 



Pip + 2) 
1 

p(p + 2) 



(2Tr(D 2 ) + Tr 2 (D)) 
(2Tr(S 2 ) +Tr 2 (S)) , 



(43) 



(44) 



and 



£;{(zfDz,) 2 } = lTr(D 2 ) = -iTr(S 2 ), (45) 



Expressing C in terms of z$, there is 

n 

C = Vz. t zf A T . (46) 

71 

i=l 

Then, 

n 

S { e } = S A E S { z » z nA T = S, (47) 

and accordingly we have 

m u =£{Tr(C)} = Tr(E), (48) 

E|Tr(CS)| = Tr(S 2 ). (49) 



m 12 = 
For m 2 there is 

,2 



A E Zjzf A T A ^ zjzJA 7 



i=i 



3=1 



«=1 3=1 



(50) 



^2^ i Tr ( E E ^zf Dz.zjD 

=1 3=1 



2 n n 

= ^EE^{(^) 2 }- 

i=i j=i 

Now substitute (04)) and (05]) to d50b : 



m 2 



pi n 



ri 2 \p(p + 2) 



(2Tr(E 2 ) + Tr 2 (S)) + 



i(n - 1) 



Tr(S ) 



-^ M (2 Tr ( S 2 ) + T r 2 (E ))+ (l-I)Tr(E 2 ) 
1 . 2 \ „ ,_ 2n , Tr 2 (E) 



n n(l + 2/p) 



Tr(E 2 ) + 



n(l + 2/p) ' 



(51) 

Recalling Tr(S) = p, (fT2l is finally obtained by substituting 
(ED, d49]l and (EB into ([36]). 
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